Finetune (OpenAI) (Generative Models)

Synopsis

Finetunes an OpenAI model to your data.

Description

This operator starts a finetuning job on OpenAI. It will first check the data for correctness and will estimate the expected price. If the price is below a limit specified as a parameter, the data will be uploaded and a new finetuning job will be started. The output of this operator is therefore not a model but a data table with some job parameters. The most important one is the job ID. This ID can be used to query the status of the finetuning job with the Check Job Status operator which will also deliver the model ID after the finetuning has successfully finished. Please note that OpenAI will also automatically deploy the model so you can use the model ID as parameter for the Add Column operator instead of using one of the usual OpenAI models. Finally, OpenAI will also send you an email to the address associated with this organization after a finetuning job has finished. If you select Azure as type, you will need to provide a connection to Azure Open AI instead. This dictionary connection needs to contain the 'api_key' as well as the 'api_base_url' of your Azure OpenAI environment. Important: Unlike models finetuned on OpenAI, the model will not be automatically deployed in Azure OpenAI. Please deploy the finetuned model in the web interface after the finetuning has finished. Please refer to the Azure OpenAI documentation for additional information.

Input

data (Data table)
The training data for this finetuning with at least two columns for the inputs as well as the expected results.
connection (Connection)
A Dictionary Connection providing the API key as a key value pair with a key named 'api_key'.

Output

data (Data table)
A data set describing the finetuning job (including the job id which you will need for fetching the model id of the result).
connection (Connection)
The input connection.

Parameters

type Indicates if this operator should use an OpenAI model or a model hosted by Microsoft Azure.
model The model to be finetuned. Please note that only foundation models can be finetuned and finetuning of finetuned models is not supported.
input column The name of the attribute or data column which should be used as input for this model.
target column The name of the attribute or column which should be used as the target for this finetuning. The target is the desired answer from the model for the given inputs.
system prompt A system prompt which can be used to initialize the model or embody a specific persona.
epochs The number of epochs for this finetuning. Values between 4 and 15 typically deliver the best results.
check price limit Indicates if a price estimate should be calculated before finetuning is started and if the training should be aborted if the estimated price exceeds the define limit.
price limit Before finetuning is started, this operator will estimate the expected total price. If that estimation exceeds this limit (in USD), the finetuning will not be started.
conda environment The conda environment used for this downloading task. Additional packages may be installed into this environment, please refer to the extension documentation for additional details on this and on version requirements for Python and some packages which have be present in this environment.

Tutorial Processes

Finetune ChatGPT to be sarcastic

We first generate a simple data set containing only ten questions for different capitals together with ten somewhat sarcastic answers. This data is delivered to the finetuning operator which also gets a system prompt asking it to be more sarcastic. The finetuning operator then uploads the data to OpenAI and starts a finetuning job. The job ID is delivered as output of this operator. Please take note since you will need the job id to query for the finetuning job status as well to get the model ID which is needed to use the finetuned model later with the Send Prompt operator. IMPORTANT: you will need to provide your own API key as Dictionary Connection for this process to run. Please refer to the documentation for additional information.

Categories

Versions